# Video Text Understanding

ViCA2 is a multimodal vision-language model focused on video understanding and visual-spatial cognition tasks.

Transformers English

Vica2 Stage2 Onevision Ft

ViCA2 is a 7B-parameter multimodal vision-language model focused on video understanding and visual-spatial cognition tasks.

Transformers English

Videochat R1 Thinking 7B

VideoChat-R1-thinking_7B is a multimodal model based on Qwen2.5-VL-7B-Instruct, focusing on video-text-to-text tasks.

Transformers English

A multimodal large language model based on the paper 'Task Preference Optimization: Improving Multimodal Large Language Models through Visual Task Alignment'.

© 2025 AIbase